Decennial Census
American Community Survey (ACS)
03/29/2021
Welcome! While we’re waiting:
Clone or download the workshop files from: https://github.com/dlab-berkeley/rCensus_workshop
Open RStudio
Open a new R script file
About me
About you
Describe primary Census data products
Introduce R packages for working with Census Data
Use those packages to fetch census data
Use those packages to fetch census data plus census geographic boundary files
Make maps of census data
The “nation’s leading provider of quality data about its people and economy.”
Available at www.census.gov
Decennial Census
American Community Survey (ACS)
Complete count of the population every 10 years since 1790
Includes data on
population, by age & race/ethnicity
housing, by occupancy & tenure (owned, rented)
From 1840 - 2000, additional questions were asked of a sample of the population.
Since 2005 those sample questions now comprise the American Community Survey (ACS).
Annual survey of a sample of about 3.5 million households
Provides estimates of demographic, social, economic & housing characteristics
Includes margin of error values for the estimates.
| Demographic* | Social | Economic | Housing |
|---|---|---|---|
| Sex | Families | Income | Tenure* |
| Age | Education | Benefits | Occupancy* |
| Race | Marital Status | Employment Status | Structure Type |
| Hispanic Origin | Fertility | Occupation | Housing Value |
| | Grandparents | Industry | Taxes & Insurance |
| | Veterans | Commuting | Utilities |
| | Disability Status | Place of Work | Mortgage |
| | Language at Home | Health Insurance | Monthly Rent |
| | Citizenship | | |
| | Mobility | | |
Census microdata (data collected from individuals) are aggregated and made publicly available at one or more levels of geographic aggregation. Not all data tables are available for all geographies, e.g., only decennial census data are available at the block level.
ACS 1 year and 5 year products are currently available through 2019
ACS 3 year no longer available (2008 - 2013)
ACS 5 year data provides much better estimates, lower margins of error
Identify your
Then determine what specific tables and variables are available
“If you want to measure change you can’t change the measures!”
Census tables, variables, geographies, and geographic boundaries change over time!
Measuring change over time with census data is its own thing, complex and not covered by this workshop!
You can download Census data directly from:
You can download Census geographic data directly on the census website or from NHGIS.
You can write code to fetch data from the Census Web APIs
API: application programming interface
Web API: URLs can be formatted to make queries that return data
Or you can leverage an existing R package to make this easier!
Only a subset of recent Census data products are available via APIs
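To make the Web API idea concrete, here is a minimal sketch of how a query URL can be assembled in R. It assumes the 2010 decennial SF1 endpoint and the `for=state:*` geography clause; it only builds the string and does not send a request. The variable names `base_url`, `variables`, and `query_url` are illustrative.

```r
# Build a Census Web API query URL by hand (no request is sent here).
# Assumption: the 2010 decennial SF1 endpoint, all states as the geography.
base_url  <- "https://api.census.gov/data/2010/dec/sf1"
variables <- c("NAME", "P001001")   # state name and total population

query_url <- paste0(
  base_url,
  "?get=", paste(variables, collapse = ","),  # comma-separated variable list
  "&for=state:*"                              # one row per state
)

query_url
```

In practice you would also append your API key as a `key` parameter. Packages like tidycensus build and send URLs of this kind for you.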
These are the ones we recommend and will use today.
An R package with functions that make it easier to fetch decennial census and ACS data from the Census APIs.
Limited to the data products available from the Census APIs
tidycensus is actively maintained and expanding to include more census data products (see the tidycensus website)
tidycensus requires you to first get a Census API key
Provides access to Census geographic data files
Also provides access to additional geographic data,
Used by tidycensus to access state, county, tract, block group, block, and ZCTA boundaries.
Packages developed by Kyle Walker to make it easier to fetch data from Census websites and APIs in R and get that data in a useable format to analyze, plot, and map.
Check out his website to keep abreast of his great packages, blog posts, and tutorials.
Walker also developed a new DataCamp course: Analyzing US Census Data in R!
A collection of R Packages for data science, developed primarily by Hadley Wickham, Chief Scientist at RStudio, including:
dplyr and tidyr for reshaping data
ggplot2 for plotting
purrr, readr and tibble for improved performance
These packages and more are used by tidycensus under the hood.
Simple features for geospatial data objects and methods.
sf includes the functionality of the sp, rgdal, rgeos and proj4 packages.
sf is loaded and used automatically by tidycensus
mapview provides functions for quickly and easily creating interactive maps.
We will work through several exercises using tidycensus to fetch, wrangle and map census data.
Install any packages we will use that are not installed already. If you installed any of these a while ago, it’s a good idea to install updates!
# A list of the packages we will use
list_of_packages <- c("tidyverse","tidycensus","tigris","sf","mapview")
# identify the ones we need to install
new_packages <- list_of_packages[!(list_of_packages %in% installed.packages()[,"Package"])]
# install any that are not installed (new_packages)
if(length(new_packages) > 0) {
print(paste("Installing these packages:", new_packages))
install.packages(new_packages)
} else {
print("All packages already installed!")
}
Load the packages we will use today
library(tidycensus)
library(tidyverse)
library(tigris)
library(sf)
library(mapview)
If you are getting errors try importing dplyr or reinstalling dplyr package as that has worked for some.
You need a census API key to programmatically fetch census data.
Get it here (pretty quick):
For more info see:
Use the tidycensus function census_api_key to register your API key with tidycensus
# Install your census api key - long alphanumeric string
census_api_key("THE_BIG_LONG_ALPHANUMERIC_API_KEY_YOU_GOT_FROM_CENSUS")
I keep my key in a file so no one can see it
# source (run) an r script that creates a variable with my key
source("/Users/pattyf/Documents/Dlab/workshops/keys/census_api_key.R")
#register the key
census_api_key(my_census_api_key)
## To install your API key for use in future sessions, run this function with `install = TRUE`.
Be sure to clone or download & unzip the workshop files from: https://github.com/dlab-berkeley/rCensus_workshop
Then, set your working directory to the repo folder, e.g.,
setwd("~/Documents/Dlab/workshops/2021/rCensus_workshop")
Let’s start by fetching population data from the 2010 Census for all states
In order to fetch census data you need to identify the census variables that contain the data of interest.
Census data variables are organized in tables
Which are organized by topic or concept.
The tidycensus load_variables function can help with this step.
First, take a look at the function documentation.
?load_variables
Use load_variables to fetch all variables used in the 2010 census into a dataframe.
vars2010 <- load_variables(year=2010, # Year or end year for ACS-5yr
dataset = 'sf1', # 'sf1' for decennial or 'acs5', etc
cache = TRUE) # Whether to save fetched data locally
Let’s take a look at and discuss the resultant dataframe.
View(vars2010)
Topics: Population, housing
Variables: 3,346
333 Tables - that’s a lot!
https://www.census.gov/data/datasets/2010/dec/summary-file-1.html
We can sort and filter the vars2010 dataframe to find it.
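As a sketch of that filtering step, the example below searches a small stand-in for `vars2010`. The real data frame returned by `load_variables` has the same three columns (`name`, `label`, `concept`) but thousands of rows; the three rows in `vars_toy` are illustrative, so the example runs without an API call.

```r
library(dplyr)

# Toy stand-in for vars2010: same columns as load_variables() output,
# but only a few illustrative rows.
vars_toy <- tibble(
  name    = c("P001001", "P002002", "H004004"),
  label   = c("Total", "Total!!Urban", "Total!!Renter occupied"),
  concept = c("TOTAL POPULATION", "URBAN AND RURAL", "TENURE")
)

# Keep the rows whose concept mentions total population (case-insensitive)
pop_vars <- vars_toy %>%
  filter(grepl("total population", concept, ignore.case = TRUE))

pop_vars
```

The same `filter` call works on the full `vars2010` data frame; searching `label` instead of `concept` is often useful too.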
We can use the tidycensus function get_decennial to fetch the 2010 census data for total population by state.
First, check the documentation for the function.
?get_decennial
Fetch total population by state (P001001) from the 2010 census using get_decennial.
pop2010 <- get_decennial(geography = "state", # census tabulation unit
variables = "P001001", # variable(s) of interest
year = 2010) # census year
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
How many rows and columns?
Do you see the expected number of states?
What column contains the population counts?
Do the data values seem to be right?
head(pop2010)
tail(pop2010)
head(pop2010)
## # A tibble: 6 x 4
##   GEOID NAME       variable    value
##   <chr> <chr>      <chr>       <dbl>
## 1 01    Alabama    P001001   4779736
## 2 02    Alaska     P001001    710231
## 3 04    Arizona    P001001   6392017
## 4 05    Arkansas   P001001   2915918
## 5 06    California P001001  37253956
## 6 22    Louisiana  P001001   4533372
tail(pop2010)
## # A tibble: 6 x 4
##   GEOID NAME          variable   value
##   <chr> <chr>         <chr>      <dbl>
## 1 51    Virginia      P001001  8001024
## 2 53    Washington    P001001  6724540
## 3 54    West Virginia P001001  1852994
## 4 55    Wisconsin     P001001  5686986
## 5 56    Wyoming       P001001   563626
## 6 72    Puerto Rico   P001001  3725789
We can visualize the data to get a quick overview of the distribution of data values.
It’s a first step in exploratory data analysis and a last step in data communication.
ggplot2 is the most commonly used R package for data visualization.
It is part of the tidyverse.
Let’s use it to visualize the population data.
Use ggplot2 to create an ordered horizontal bar chart.
pop_plot<- ggplot(data=pop2010, aes(x=reorder(NAME,value), y=value/1000000)) +
geom_bar(stat="identity") + coord_flip() +
theme_minimal() +
labs(title = "2010 US Population by State") +
xlab("State") +
ylab("in millions")
Fetch total population data by state from the 2000 decennial census.
Don’t assume variable names are the same across years.
Check first by loading the 2000 variables into a dataframe.
Total Population in 2000
# What is the variable name in 2000?
vars2000 <- load_variables(year=2000, dataset = 'sf1', cache = T)
# Take a look and search in the dataframe
View(vars2000)
# Fetch the 2000 pop data
pop2000 <- get_decennial(geography = "state",
variables = "P001001",
year = 2000)
# Take a look
View(pop2000)
In the previous example we retrieved population data for all states.
This is the default behavior if you don’t specify a subset.
But you can limit the data to be retrieved by subunits like state.
Let’s fetch data for just 3 states.
state_pop2010 <- get_decennial(geography = "state", # census tabulation unit
variables = "P001001", # variables of interest
year = 2010, # census year
state=c("CA","OR","WA")) # Filter by states of interest
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
Note we are referencing states by their abbreviation.
state_pop2010
## # A tibble: 3 x 4
##   GEOID NAME       variable    value
##   <chr> <chr>      <chr>       <dbl>
## 1 06    California P001001  37253956
## 2 41    Oregon     P001001   3831074
## 3 53    Washington P001001   6724540
get_decennial accepts a number of different values for tabulation unit.
These include state, county, tract, block group, block, and ZCTA.
Let’s change the tabulation unit from state to county.
county_pop2010 <- get_decennial(geography = "county", # census tabulation unit
variables = "P001001", # variable(s) of interest
year = 2010) # data year - only one!
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
View the county data to see what was retrieved.
View(county_pop2010)
Try it before you look ahead at solutions.
## Fetch population by **county** for Oregon & California
county_pop2010_ca_and_or <- get_decennial(geography = "county", # census tabulation unit
variables = "P001001", # variables of interest
year = 2010,
state=c('CA','OR'))
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
#head(county_pop2010_ca_and_or)
Census tracts are the most commonly used census tabulation unit.
Let’s change the census tabulation unit to tract and fetch population data.
Fetch total population for all states by census tract
## Fetch population by **tract** for all states.
pop2010_tracts <- get_decennial(geography = "tract", # census tabulation unit
variables = "P001001", # variables of interest
year = 2010)
Fetch total population for California by census tract
## Fetch population by **tract** for California.
cal_pop2010_tracts <- get_decennial(geography = "tract", # census tabulation unit
variables = "P001001", # variables of interest
year = 2010,
state=c('CA')) # State filter
If you want census data at the tract level or below, you must specify the state(s).
tract_pop2010 <- get_decennial(geography = "tract", # census tabulation unit
variables = "P001001", # variable of interest
year = 2010, # census year - only one!
state="CA", # limit to California
county=c("Alameda","Contra Costa")) # & these counties
## Getting data from the 2010 decennial Census
View the results! How many census tracts are in these 2 counties?
dim(tract_pop2010)
View(tract_pop2010)
You can use names, abbreviations or FIPS codes for your state and county.
# County FIPS codes for
# Alameda, San Francisco, Contra Costa, Marin, Napa,
# San Mateo, Santa Clara, Solano, Sonoma
nine_counties <- c("001", "075", "013", "041", "055", "081", "085", "095", "097")
# Fetch population by **tract** for the nine county Bay Area
bayarea_pop2010_tract <- get_decennial(geography = "tract", # census tabulation unit
variables = "P001001", # variable of interest
year = 2010, # census year
state="CA", # limit to state of California
county=nine_counties) # and only these counties
# View results
# View(bayarea_pop2010_tract)
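If you don't know a county's FIPS code, one option is the `fips_codes` lookup table bundled with tidycensus. The sketch below filters it for two Bay Area counties; it needs no API call. The name `bay_lookup` is illustrative.

```r
library(tidycensus)
library(dplyr)

# fips_codes is a data frame shipped with tidycensus:
# one row per county, with state and county names and codes.
bay_lookup <- fips_codes %>%
  filter(state == "CA",
         county %in% c("Alameda County", "Contra Costa County")) %>%
  select(state, state_code, county, county_code)

bay_lookup
```

The `county_code` column holds the three-digit codes that can be passed to the `county` argument, as in the `nine_counties` vector above.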
What three things are new here?
#urban and rural pop for 3 CA counties
ur_pop10 <- get_decennial(geography = "county", # census tabulation unit
variables = c(urban="P002002",rural="P002005"),
year = 2010,
summary_var = "P002001", # The denominator
state='CA',
county=c("Napa","Sonoma","Mendocino"))
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
Multiple variables: variables = c("P002002","P002005")
Named variables: variables = c(urban="P002002",rural="P002005")
summary_var: a denominator (here, the total count of all people or households surveyed) that can be used for calculations like percent of total, e.g., summary_var = "P002001"
ur_pop10
## # A tibble: 6 x 5
##   GEOID NAME                         variable  value summary_value
##   <chr> <chr>                        <chr>     <dbl>         <dbl>
## 1 06045 Mendocino County, California urban     48110         87841
## 2 06055 Napa County, California      urban    118194        136484
## 3 06097 Sonoma County, California    urban    424102        483878
## 4 06045 Mendocino County, California rural     39731         87841
## 5 06055 Napa County, California      rural     18290        136484
## 6 06097 Sonoma County, California    rural     59776        483878
The summary_value column comes in handy when you want to compute percent of total, for example:
# Calculate the percent of population that is Urban or Rural
ur_pop10 <- ur_pop10 %>%
mutate(pct = 100 * (value / summary_value))
Let’s take a look at the output
ur_pop10 # Take a look
## # A tibble: 6 x 6
##   GEOID NAME                         variable  value summary_value   pct
##   <chr> <chr>                        <chr>     <dbl>         <dbl> <dbl>
## 1 06045 Mendocino County, California urban     48110         87841  54.8
## 2 06055 Napa County, California      urban    118194        136484  86.6
## 3 06097 Sonoma County, California    urban    424102        483878  87.6
## 4 06045 Mendocino County, California rural     39731         87841  45.2
## 5 06055 Napa County, California      rural     18290        136484  13.4
## 6 06097 Sonoma County, California    rural     59776        483878  12.4
Plots give us compact visual summaries of the data
myplot <- ggplot(data = ur_pop10,
mapping = aes(x = NAME, fill = variable,
y = ifelse(test = variable == "urban",
yes = -pct, no = pct))) +
geom_bar(stat = "identity") +
scale_y_continuous(labels = abs, limits=c(-100,100)) +
labs(title="Urban & Rural Population in Wine Country",
x="County", y = " Percent of Population", fill="") +
coord_flip()
Don’t worry if you don’t get all the ggplot code now. It’s here for reference.
myplot
Fetching all variables in a table is often helpful, but you need to keep track of the meaning of each variable.
alco_pop10 <- get_decennial(geography = "tract", # Census tabulation unit
table = "P002", # Table of urban & rural population counts
year = 2010, # Decennial census year
state='CA', # Filter state
county="Alameda") # Filter county
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
unique(alco_pop10$variable) # What and how many unique vars in table?
## [1] "P002001" "P002002" "P002003" "P002004" "P002005" "P002006"
head(alco_pop10,3) # Take a look at output
## # A tibble: 3 x 4
##   GEOID       NAME                                          variable value
##   <chr>       <chr>                                         <chr>    <dbl>
## 1 06001400100 Census Tract 4001, Alameda County, California P002001   2937
## 2 06001400200 Census Tract 4002, Alameda County, California P002001   1974
## 3 06001400300 Census Tract 4003, Alameda County, California P002001   4865
Let’s try all three of these commands and then look at the output to see what’s different.
get_decennial(geography = "state", variables = "P001001",
year = 2010)
get_decennial(geography = "state", variables = c(pop10="P001001"),
year = 2010)
get_decennial(geography = "state", variables = c(pop10="P001001"),
year = 2010, output="wide")
Your R skills can help you reformat the data and make it more useable.
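For example, data that has already been fetched in the default tidy (long) format can be reshaped to wide with `tidyr::pivot_wider`. This is a minimal sketch using a few values copied from the `ur_pop10` output above; `ur_long` and `ur_wide` are illustrative names.

```r
library(dplyr)
library(tidyr)

# Long (tidy) format: one row per county per variable.
# Values copied from the ur_pop10 output (Napa and Sonoma only).
ur_long <- tibble(
  GEOID    = c("06055", "06097", "06055", "06097"),
  NAME     = c("Napa County, California", "Sonoma County, California",
               "Napa County, California", "Sonoma County, California"),
  variable = c("urban", "urban", "rural", "rural"),
  value    = c(118194, 424102, 18290, 59776)
)

# Wide format: one row per county, one column per variable
ur_wide <- ur_long %>%
  pivot_wider(names_from = variable, values_from = value)

ur_wide
```

This gives the same shape that `output="wide"` returns directly, which is handy when the data is already in memory.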
Let’s fetch population data for 2010 & 2000 by state with output=wide.
Then we will combine these into one data frame.
Fetch pop by state from both the 2000 and 2010 census
pop2000 <- get_decennial(geography = "state",
variables = c(pop00="P001001"),
year = 2000, output="wide")
## Getting data from the 2000 decennial Census
## Using Census Summary File 1
pop2010 <- get_decennial(geography = "state",
variables = c(pop10="P001001"),
year = 2010, output="wide")
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
What column(s) can we use to merge these two dataframes?
head(pop2000, 3)
## # A tibble: 3 x 3
##   GEOID NAME      pop00
##   <chr> <chr>     <dbl>
## 1 01    Alabama 4447100
## 2 02    Alaska   626932
## 3 04    Arizona 5130632
head(pop2010, 3)
## # A tibble: 3 x 3
##   GEOID NAME      pop10
##   <chr> <chr>     <dbl>
## 1 01    Alabama 4779736
## 2 02    Alaska   710231
## 3 04    Arizona 6392017
Save in a new dataframe with both columns
pop2000_2010 <- pop2000 %>% merge(pop2010, by="NAME") %>%
select(NAME, pop00, pop10)
head(pop2000_2010,3)
##      NAME   pop00   pop10
## 1 Alabama 4447100 4779736
## 2  Alaska  626932  710231
## 3 Arizona 5130632 6392017
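An alternative sketch of the same merge uses `dplyr::left_join` on GEOID, which tends to be a safer key than NAME because place names can change between censuses. The toy data frames below copy the three rows shown above so the example runs without an API call; the `pct_change` column is an added illustration.

```r
library(dplyr)

# First three states from each census, copied from the output above
pop00_toy <- tibble(GEOID = c("01", "02", "04"),
                    NAME  = c("Alabama", "Alaska", "Arizona"),
                    pop00 = c(4447100, 626932, 5130632))
pop10_toy <- tibble(GEOID = c("01", "02", "04"),
                    NAME  = c("Alabama", "Alaska", "Arizona"),
                    pop10 = c(4779736, 710231, 6392017))

# Join on GEOID, then compute percent change from 2000 to 2010
pop_change <- pop00_toy %>%
  left_join(select(pop10_toy, GEOID, pop10), by = "GEOID") %>%
  mutate(pct_change = 100 * (pop10 - pop00) / pop00)

pop_change
```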
Use write.csv to save a data frame to a CSV file.
write.csv(pop2000_2010, file="data_out/pop2000_2010.csv", row.names = FALSE)
You can fetch geographic data by adding the parameter geometry=TRUE to tidycensus functions
Under the hood, tidycensus calls the tigris package to fetch data from the Census Geographic Data APIs.
Only a subset of data available via tigris can be accessed via tidycensus.
You can then use your favorite mapping functions or libraries like plot, ggplot and tmap to make maps.
Before fetching census geographic data, we need to set the option tigris_use_cache to TRUE
Caching greatly speeds things up if you fetch the same census geographic data repeatedly.
# Tigris options - used by tidycensus
# Cache retrieved geographic data locally
options(tigris_use_cache = TRUE)
We fetch the geospatial data by setting geometry=TRUE.
pop2010geo <- get_decennial(geography = "state",
variables = c(pop10="P001001"),
year = 2010,
output="wide",
geometry=TRUE) # Fetch geometry with the data for mapping
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
Let’s take a minute to discuss the format of an sf spatial object.
head(pop2010geo, 3)
## Simple feature collection with 3 features and 3 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -90.41814 ymin: 41.23796 xmax: -66.9499 ymax: 48.19097
## geographic CRS: NAD83
## # A tibble: 3 x 4
##   GEOID NAME        pop10                                              geometry
##   <chr> <chr>       <dbl>                                    <MULTIPOLYGON [°]>
## 1 23    Maine      1.33e6 (((-67.61976 44.51975, -67.61541 44.52197, -67.58774 …
## 2 25    Massachus… 6.55e6 (((-70.83204 41.6065, -70.82373 41.59857, -70.82092 4…
## 3 26    Michigan   9.88e6 (((-88.68443 48.11578, -88.67563 48.12044, -88.67639 …
R sf objects include
a dataframe with a geometry column named geometry
a CRS (coordinate reference system), specified by
For a deeper understanding of the sf package and its functionality, we recommend our Geospatial-Fundamentals-in-R-with-sf workshop.
All census geographic data use the NAD83 CRS, or coordinate reference system. NAD83 stands for North American Datum of 1983. The geographic coordinates are longitude and latitude values encoded as decimal degrees.
WGS84, the World Geodetic System of 1984, is the most commonly used geographic CRS. Point locations in the two systems differ by up to about 1 meter in the continental US.
Many geospatial operations require you transform data to a common CRS before conducting spatial analysis or mapping.
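As a minimal sketch of such a transformation, the example below builds a single illustrative point in NAD83 (EPSG:4269) and converts it to WGS84 (EPSG:4326) with `sf::st_transform`. The coordinates are an assumption chosen for illustration, not workshop data.

```r
library(sf)

# One illustrative point (longitude, latitude) stored as NAD83, EPSG:4269
pt_nad83 <- st_sfc(st_point(c(-122.27, 37.87)), crs = 4269)

# Transform to WGS84, EPSG:4326
pt_wgs84 <- st_transform(pt_nad83, crs = 4326)

# Confirm the CRS of the result
st_crs(pt_wgs84)$epsg
```

The same call works on a whole sf data frame, e.g. `st_transform(pop2010geo, crs = 4326)`.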
An in-depth discussion of CRSs is outside the scope of this workshop. See Geocomputation in R for more information.
We can use plot to make a quick map of the geometry stored in an sf spatial object.
plot(pop2010geo$geometry)
What do you get if you plot the sf object without specifying “$geometry”?
Try it!
plot(pop2010geo)
The vast geographic extent and non-contiguous nature of the USA makes it difficult to map.
tidycensus includes a shift_geo parameter to shift AK & HI to below Texas. (In newer tidycensus releases shift_geo is deprecated in favor of tigris::shift_geometry().)
pop2010geo_shifted <- get_decennial(geography = "state",
variables = c(pop10="P001001"),
output="wide",
year = 2010,
geometry=TRUE,
shift_geo=TRUE)
## Getting data from the 2010 decennial Census
## Using feature geometry obtained from the albersusa package
## Using Census Summary File 1
## Please note: Alaska and Hawaii are being shifted and are not to scale.
plot(pop2010geo_shifted$geometry)
You can save any sf data object to a shapefile using st_write
st_write(pop2010geo_shifted, "data_out/usa_pop2010_shifted.shp")
# Check to see if the data was written out to a shapefile
dir("data_out")
Use the sf plot command to make a map that color codes the geometry by the column values
plot(pop2010geo_shifted['pop10']) # a choropleth map!
ggplot(pop2010geo_shifted, aes(fill = pop10)) + geom_sf() # tells ggplot that geographic data are being plotted
Create a map of CA Population in 2010 by county
2010 pop Data for California Counties
#fetch it
cal_pop10 <- get_decennial(geography = "county",
variables = "P001001",
year = 2010,
state='CA',
geometry=TRUE)
# map it
plot(cal_pop10['value'])
We can fetch the census data and the geometry for more than one state with same function call
west_pop10 <- get_decennial(geography = "county",
variables = "P001001",
year = 2010,
state=c('CA', 'NV'),
geometry=T)
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
These are just quick plots to make sure we got the right data!
plot(west_pop10['value'])
Fetch and map the 2010 population by census tract for Alameda and Contra Costa counties.
Fetch Tract population & geometry data for Alameda & Contra Costa Counties
alcc_pop10 <- get_decennial(geography = "tract",
variables = "P001001",
year = 2010,
state='CA',
county=c("Alameda","Contra Costa"),
geometry=T)
## Getting data from the 2010 decennial Census
Map it
plot(alcc_pop10['value'])
Let’s use the 2010 census data to map the percent of San Francisco properties that were rented
To start, identify the variables for the
total number of housing units
number of renter occupied units
sf_rented <- get_decennial(geography = , # census tabulation unit
variables = , # number of households rented
year = ,
summary_var = , # Total households
state=,
county=,
geometry=)
sf_rented <- get_decennial(geography = "tract", # census tabulation unit
variables = "H004004", #number of households rented
year = 2010,
summary_var = "H004001", # Total households
state='CA',
county='San Francisco',
geometry=T)
## Getting data from the 2010 decennial Census
## Using Census Summary File 1
How to get the percent of units that were rented?
head(sf_rented)
## Simple feature collection with 6 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.4267 ymin: 37.79121 xmax: -122.3996 ymax: 37.81144
## geographic CRS: NAD83
## # A tibble: 6 x 6
##   GEOID  NAME       variable value summary_value                        geometry
##   <chr>  <chr>      <chr>    <dbl>         <dbl>              <MULTIPOLYGON [°]>
## 1 06075… Census Tr… H004004   1707          2090 (((-122.4206 37.81111, -122.40…
## 2 06075… Census Tr… H004004   1830          2544 (((-122.425 37.811, -122.4242 …
## 3 06075… Census Tr… H004004   1492          2026 (((-122.4149 37.80354, -122.41…
## 4 06075… Census Tr… H004004   1741          2479 (((-122.4129 37.80218, -122.41…
## 5 06075… Census Tr… H004004   1792          2338 (((-122.4117 37.79629, -122.41…
## 6 06075… Census Tr… H004004   1418          1858 (((-122.4092 37.79204, -122.41…
sf_pct_rented <- sf_rented[sf_rented$value > 0,] %>%
mutate(pct = 100 * (value / summary_value))
# Take a look
head(sf_pct_rented)
## Simple feature collection with 6 features and 6 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.4267 ymin: 37.79121 xmax: -122.3996 ymax: 37.81144
## geographic CRS: NAD83
## # A tibble: 6 x 7
##   GEOID  NAME    variable value summary_value                     geometry   pct
##   <chr>  <chr>   <chr>    <dbl>         <dbl>           <MULTIPOLYGON [°]> <dbl>
## 1 06075… Census… H004004   1707          2090 (((-122.4206 37.81111, -122…  81.7
## 2 06075… Census… H004004   1830          2544 (((-122.425 37.811, -122.42…  71.9
## 3 06075… Census… H004004   1492          2026 (((-122.4149 37.80354, -122…  73.6
## 4 06075… Census… H004004   1741          2479 (((-122.4129 37.80218, -122…  70.2
## 5 06075… Census… H004004   1792          2338 (((-122.4117 37.79629, -122…  76.6
## 6 06075… Census… H004004   1418          1858 (((-122.4092 37.79204, -122…  76.3
plot(sf_pct_rented['pct'])
We can use tidycensus to fetch ACS data just like we fetched the decennial census data.
We will use the function get_acs instead of get_decennial
BUT it’s a more complex workflow because
there are a lot more ACS tables and variables
Because the ACS contains sample data, each ACS variable that you retrieve with tidycensus will fetch both an estimate of the value and a margin of error.
Use the load_variables function to get a dataframe of all variables from the ACS 2015-2019 5 year dataset
Then View the dataset and filter for variables related to median household income
acs2019vars <- load_variables(year=2019, dataset = 'acs5', cache = T)
# Review and filter the dataframe of ACS variables
#View(acs2019vars)
Let’s fetch the median household income data for Alameda County
alco_mhhincome <- get_acs(geography='tract',
variables=c(median_hhincome = "B19013_001"),
year = 2019,
state='CA',
county='Alameda',
geometry=T
)
## Getting data from the 2015-2019 5-year ACS
head(alco_mhhincome)
## Simple feature collection with 6 features and 5 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.2887 ymin: 37.52248 xmax: -121.8779 ymax: 37.81562
## geographic CRS: NAD83
##         GEOID                                             NAME        variable
## 1 06001442301 Census Tract 4423.01, Alameda County, California median_hhincome
## 2 06001437400    Census Tract 4374, Alameda County, California median_hhincome
## 3 06001437701 Census Tract 4377.01, Alameda County, California median_hhincome
## 4 06001402400    Census Tract 4024, Alameda County, California median_hhincome
## 5 06001402500    Census Tract 4025, Alameda County, California median_hhincome
## 6 06001450743 Census Tract 4507.43, Alameda County, California median_hhincome
##   estimate   moe                       geometry
## 1   110761 21966 MULTIPOLYGON (((-121.9701 3...
## 2    86210  9325 MULTIPOLYGON (((-122.0926 3...
## 3    64559  6732 MULTIPOLYGON (((-122.0747 3...
## 4    39913  8581 MULTIPOLYGON (((-122.284 37...
## 5    30000 12436 MULTIPOLYGON (((-122.2879 3...
## 6   128737  9289 MULTIPOLYGON (((-121.9066 3...
plot(alco_mhhincome['???'])
plot(alco_mhhincome['estimate'])
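Because every ACS estimate comes with a margin of error, it can help to look at the two together before mapping. The sketch below computes lower and upper bounds and the MOE as a percent of the estimate, using the first three rows of the output above. It assumes the MOEs are at the 90 percent confidence level, which is the ACS default; `income_toy` and the new column names are illustrative.

```r
library(dplyr)

# Estimates and MOEs copied from the first three tracts in the output above
income_toy <- tibble(
  GEOID    = c("06001442301", "06001437400", "06001437701"),
  estimate = c(110761, 86210, 64559),
  moe      = c(21966, 9325, 6732)
)

# Bounds of the interval (estimate +/- moe) and relative size of the MOE
income_bounds <- income_toy %>%
  mutate(lower   = estimate - moe,
         upper   = estimate + moe,
         moe_pct = 100 * moe / estimate)

income_bounds
```

Tracts with a large `moe_pct` deserve extra caution when interpreting a map of the estimates.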
First define the set of variables of interest.
# Median Household income by Race - variables from ACS 2015-2019
inc_by_race <- c(All = "B19013_001",
White = "B19013H_001",
Black = "B19013B_001",
Asian = "B19013D_001",
Hispanic = "B19013I_001" )
Fetch census tract data for multiple variables at once
alco_mhhincome_by_race <- get_acs(geography='tract',
variables=inc_by_race,
year = 2019,
state='CA',
county='Alameda',
geometry=T )
## Getting data from the 2015-2019 5-year ACS
Facet maps make it easy to create visualizations of small multiples, or subsets of the data that facilitate comparisons. Here, we use ggplot to make multiple maps of income by race for Alameda County.
medhhinc_facet_map <- alco_mhhincome_by_race %>%
ggplot(aes(fill = estimate)) +
facet_wrap(~variable) +
geom_sf(color=NA) +
scale_fill_viridis_c()
medhhinc_facet_map
…because sometimes you don’t want tidy format
alco_mhhincome_by_race2 <- get_acs(geography='tract',
variables=inc_by_race,
year = 2019,
state='CA',
county='Alameda',
geometry=T,
output="wide")
## Getting data from the 2015-2019 5-year ACS
head(alco_mhhincome_by_race2)
## Simple feature collection with 6 features and 12 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -122.2887 ymin: 37.52248 xmax: -121.8779 ymax: 37.81562
## geographic CRS: NAD83
##         GEOID                                             NAME   AllE  AllM
## 1 06001442301 Census Tract 4423.01, Alameda County, California 110761 21966
## 2 06001437400    Census Tract 4374, Alameda County, California  86210  9325
## 3 06001437701 Census Tract 4377.01, Alameda County, California  64559  6732
## 4 06001402400    Census Tract 4024, Alameda County, California  39913  8581
## 5 06001402500    Census Tract 4025, Alameda County, California  30000 12436
## 6 06001450743 Census Tract 4507.43, Alameda County, California 128737  9289
##   WhiteE WhiteM BlackE BlackM AsianE AsianM HispanicE HispanicM
## 1  87686  27850     NA     NA 132071  19754    104336     26940
## 2  83417  11963 107656  37219 122692  17395     75645      9909
## 3  74000  35763  58000  29756  62262  22638     64375      8638
## 4 137938  69610  31989  19980  11818   4976        NA        NA
## 5     NA     NA  20556   5948  53523  51287        NA        NA
## 6 109671  26749     NA     NA 131350  15507    136250     42576
##                         geometry
## 1 MULTIPOLYGON (((-121.9701 3...
## 2 MULTIPOLYGON (((-122.0926 3...
## 3 MULTIPOLYGON (((-122.0747 3...
## 4 MULTIPOLYGON (((-122.284 37...
## 5 MULTIPOLYGON (((-122.2879 3...
## 6 MULTIPOLYGON (((-121.9066 3...
Make a map of MEDIAN GROSS RENT in Alameda and Contra Costa Counties by tract using data from the ACS 2015-2019 5 year product
alcc_medrent <- get_acs(geography= ,
variables= ,
year = ,
state= ,
county= ,
geometry=)
alcc_medrent <- get_acs(geography="tract",
variables=c(median_rent2019="B25064_001"),
year =2019,
state="CA",
county=c("Alameda","Contra Costa"),
geometry=T)
## Getting data from the 2015-2019 5-year ACS
# Uncomment to view map
#plot(alcc_medrent['estimate'])
Interactive mapping gives the RStudio environment some of the functionality of desktop GIS.
There are a number of R packages that you can use, including:
mapview: quick interactive exploratory data viewing
tmap: great static and interactive maps
Leaflet: highly customizable interactive maps
All of these are based on the Leaflet Javascript Library.
Let’s use mapview to make some quick interactive maps of our median household income data
mapview(alco_mhhincome_by_race2)
mapview(alco_mhhincome_by_race2, zcol="AllE")
Use mapview to create a map of median gross rent (alcc_medrent)
mapview(alcc_medrent, zcol='estimate')
ACS variables can be confusing.
Some ways to identify the best variables to explore:
Web search, especially Census web resources
The Census Reporter website (https://censusreporter.org) provides another tool for navigating topics, tables, and variable names.
The NHGIS website (nhgis.org) is a great way to browse variables of interest
We haven’t talked much about the margin of error (MOE), but it may be important in your work with ACS data.
Math is needed to combine MOEs when you combine variables.
See this web page on how to handle MOEs in tidycensus
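As a small worked example of that math, the sketch below applies the standard approximation for the MOE of a sum of estimates: the square root of the sum of the squared MOEs. The helper `moe_of_sum` and the numbers are hypothetical and written in base R so the formula is visible; tidycensus ships its own helpers (`moe_sum`, `moe_ratio`, `moe_prop`, `moe_product`) for real work.

```r
# Approximate MOE of a sum of estimates:
# square root of the sum of the squared MOEs.
# moe_of_sum is a hypothetical helper written for this illustration.
moe_of_sum <- function(moe) sqrt(sum(moe^2, na.rm = TRUE))

# Hypothetical estimates and MOEs for two groups being added together
est <- c(1200, 800)
moe <- c(300, 400)

combined_est <- sum(est)         # combined estimate
combined_moe <- moe_of_sum(moe)  # combined margin of error

combined_est
combined_moe
```

Note that the combined MOE is smaller than the simple sum of the two MOEs, so adding MOEs directly would overstate the uncertainty.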
tidycensus offers two key functions for fetching census tabular and geographic data: get_acs and get_decennial
Using tidycensus to fetch the tabular data or both tabular and geographic data is IMO way easier than any alternatives, IF you (1) know R, (2) know a bit about working with geographic data in R.
You can greatly enhance your maps if you make them with ggplot2 rather than the default plot command.
Interactive mapping greatly enhances your ability to do exploratory data analysis in RStudio.
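To illustrate the point about ggplot2, here is a minimal sketch of a styled choropleth. It uses two toy square polygons with made-up values rather than census data, so it runs without an API key; the helper `sq`, the object names, and the styling choices are all illustrative. The same layers can be applied to any sf object returned by tidycensus with geometry=TRUE.

```r
library(sf)
library(ggplot2)

# Helper that builds a unit square starting at x0 (hypothetical geometry)
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Two toy "tracts" with made-up income values, in NAD83 like census data
toy <- st_sf(
  NAME     = c("Tract A", "Tract B"),
  estimate = c(50000, 90000),
  geometry = st_sfc(sq(0), sq(1), crs = 4269)
)

# Choropleth with a viridis color scale, readable legend, and no axes
toy_map <- ggplot(toy, aes(fill = estimate)) +
  geom_sf(color = "white") +
  scale_fill_viridis_c(labels = scales::comma) +
  labs(title = "Toy choropleth", fill = "Estimate") +
  theme_void()

toy_map
```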
Related D-Lab Workshops
Great online resource for working with spatial data in R